1 Introduction

Finding the largest clique is a notoriously hard problem, even on random graphs. It is known that the clique number of a random graph $G(n, 1/2)$ is almost surely either $k$ or $k+1$, where $k = \lfloor 2\log n - 2\log\log n - 1 \rfloor$ (Section 4.5 in [1], see also [2]). However, a simple greedy algorithm finds a clique of size only $(1+o(1))\log n$ with high probability, and finding larger cliques, even of size $(1+\epsilon)\log n$, in randomized polynomial time has been a long-standing open problem [3]. In this paper, we study the following generalization: given a random graph $G(n, 1/2)$, find the largest subgraph with edge density at least $1-\delta$. We show that a simple modification of the greedy algorithm finds a subset of $2\log n$ vertices whose induced subgraph has edge density at least $0.951$, with high probability. To complement this, we show that almost surely there is no subset of $2.784\log n$ vertices whose induced subgraph has edge density $0.951$ or more.

We use $G(n,p)$ to denote a random graph on $n$ vertices where each pair of vertices appears as an edge independently with probability $p$. We use $V$ to denote its set of vertices and $E$ to denote its set of edges. Moreover, given two subsets $S \subseteq V$ and $T \subseteq V$, we use $E(S,T)$ to denote the set of edges with one endpoint in $S$ and the other endpoint in $T$; for a single vertex $v$ we write $E(v,T)$ for $E(\{v\},T)$. The density of the subgraph induced by the vertices in $S$ is given by

$$\mathrm{density}(S) = \frac{|E(S,S)|}{\binom{|S|}{2}}.$$

Thus, the expected density of $G(n, 1/2)$ is $1/2$ and the density of any clique is $1$.
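As a small illustration of the definition, the following Python sketch computes $\mathrm{density}(S)$ for an explicit graph. The edge representation (a set of frozensets) and the function name `density` are our own choices for this example, not part of the paper.

```python
from itertools import combinations


def density(S, edges):
    """density(S) = |E(S, S)| / C(|S|, 2).

    `edges` is assumed to be a set of frozenset({u, v}) pairs of an
    undirected simple graph (a representation chosen for this sketch).
    """
    S = list(S)
    if len(S) < 2:
        raise ValueError("density(S) needs |S| >= 2")
    # Each unordered pair inside S is checked once, so every edge counts once.
    inside = sum(1 for u, v in combinations(S, 2) if frozenset((u, v)) in edges)
    return inside / (len(S) * (len(S) - 1) // 2)


# A triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2), (2, 3)]}
print(density([0, 1, 2], edges))     # a clique, so density 1.0
print(density([0, 1, 2, 3], edges))  # 4 of the 6 possible edges
```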

In Section 2 we describe our algorithm for finding subgraphs of density $1-\delta$. In Section 3 we give a bound on the size of the largest subgraph of density $1-\delta$. Finally, in Section 4 we present some open problems.

2 Algorithm for finding a large subgraph of density 1 − δ

In this section, we describe our algorithm and give a relationship between the size of the subgraph obtained by the algorithm and its density. In particular, we show that the algorithm can be used to obtain a subset of $2\log n$ vertices of density $0.951$, with high probability.
Greedy Algorithm to Pick a Dense Subgraph:

Input: a random graph $G(n, 1/2)$ and $\delta > 0$.
Output: a subset $S \subseteq V$ of size $k = 2\log n$.

1. Partition the vertices into disjoint sets $V = V_1 \cup V_2 \cup \cdots \cup V_k$, each of size $n/k$.
2. Initialize $S_0 = \emptyset$.
3. For $i = 0$ to $k-1$ do:
   (a) Pick $v_{i+1} \in V_{i+1}$ that has the maximum number of edges to $S_i$, i.e., $v_{i+1} = \arg\max_{v \in V_{i+1}} |E(v, S_i)|$.
   (b) $S_{i+1} \leftarrow S_i \cup \{v_{i+1}\}$.
4. Return $S = S_k$.
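The following Python sketch simulates the algorithm above on $G(n, 1/2)$. It is an illustration under our own assumptions, not the authors' implementation: logarithms are base 2, $k = 2\lfloor \log n \rfloor$, each part has $\lfloor n/k \rfloor$ vertices (leftover vertices are ignored), and the parameter $\delta$ plays no role in the selection rule, so it is omitted. Edges are sampled lazily. This is valid here because step $i$ only inspects pairs $(v, u)$ with $v \in V_{i+1}$ and $u \in S_i$, and each such pair is inspected exactly once, so drawing a fresh fair coin for it has the same distribution as fixing the whole graph in advance. The function name `greedy_dense_subgraph` is hypothetical.

```python
import math
import random


def greedy_dense_subgraph(n, seed=0):
    """Sketch of the partition-then-greedy algorithm on G(n, 1/2).

    Assumptions (ours, for illustration): log is base 2, k = 2 * floor(log2 n),
    every part has n // k vertices, and leftover vertices are never used.
    Edges are drawn lazily: step i only looks at pairs (v, u) with v in the
    current part and u already chosen, and each such pair is looked at once,
    so a fresh fair coin per pair matches sampling G(n, 1/2) up front.
    """
    rng = random.Random(seed)
    k = 2 * int(math.log2(n))
    part_size = n // k
    order = list(range(n))
    rng.shuffle(order)                       # random partition V_1, ..., V_k
    parts = [order[j * part_size:(j + 1) * part_size] for j in range(k)]

    chosen = []   # chosen[:i] is S_i
    rows = []     # rows[i]: bit j set iff chosen[i] is adjacent to chosen[j], j < i
    for i in range(k):
        best_v, best_bits, best_deg = None, 0, -1
        for v in parts[i]:                   # parts[i] plays the role of V_{i+1}
            bits = rng.getrandbits(i) if i > 0 else 0   # edges from v to S_i
            deg = bin(bits).count("1")       # |E(v, S_i)|
            if deg > best_deg:               # keep the first maximizer
                best_v, best_bits, best_deg = v, bits, deg
        chosen.append(best_v)
        rows.append(best_bits)

    edges_inside = sum(bin(r).count("1") for r in rows)   # |E(S, S)|
    return chosen, parts, edges_inside / math.comb(k, 2)


S, parts, d = greedy_dense_subgraph(2 ** 12, seed=1)
print(len(S), round(d, 3))
```

The guarantee of Theorem 2.3 is asymptotic, so for the small values of $n$ that can be simulated this way the observed density need not reach $0.951$.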











Notice that the algorithm first partitions all nodes into k random subsets of the same size, 
and then picks one vertex from each partition. This partitioning is necessary to argue about 
independence in our analysis of choosing vertices greedily. 

In the analysis below, $H(\delta)$ denotes the standard (binary) Shannon entropy function, $H(\delta) = -(\delta\log\delta + (1-\delta)\log(1-\delta))$. The following lemma gives a lower bound on the number of edges we can expect to add to our subgraph with the $(i+1)$-st vertex chosen by the algorithm.

Lemma 2.1. For any $0 < i < k$ and $\delta_i$ that satisfies

$$H(\delta_i) \ge 1 - \frac{1}{i}\log\left(\frac{n}{2k\ln(\log n)}\right),$$

we have

$$\Pr\left(|E(v_{i+1}, S_i)| \ge (1-\delta_i)\,i\right) \ge 1 - \frac{1}{\log^2 n}.$$

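Proof. We know from previous results that, as long as $i$ is smaller than about $\log n$, the vertex added is adjacent to all vertices of $S_i$. Now consider larger $i$. The algorithm has the $n/k$ vertices of $V_{i+1}$ to choose from, and the expected number of them with at least $(1-\delta_i)i$ edges to $S_i$ is $n/k$ times the probability computed next.

Fix $v \in V_{i+1}$. The probability that $v$ has at least $(1-\delta_i)i$ edges to $S_i$ is

$$\Pr\left(|E(v,S_i)| \ge (1-\delta_i)\,i\right) = \sum_{t=(1-\delta_i)i}^{i}\binom{i}{t}2^{-i} = 2^{(H(\delta_i)+o(1)-1)\,i},$$

where $H(\delta) = -\delta\log\delta - (1-\delta)\log(1-\delta)$ is the Shannon entropy (here $\log$ is taken with base 2).

Using independence of these events for different $v \in V_{i+1}$, together with the assumption on $H(\delta_i)$ and $1 - x \le e^{-x}$, we get

$$\Pr\left(|E(v,S_i)| < (1-\delta_i)\,i \ \ \forall v \in V_{i+1}\right) \le \left(1 - 2^{(H(\delta_i)-1)\,i}\right)^{n/k} \le \left(1 - \frac{2k\ln(\log n)}{n}\right)^{n/k} \le e^{-2\ln(\log n)} = \frac{1}{\log^2 n}.$$

Therefore,

$$\Pr\left(|E(v_{i+1},S_i)| \ge (1-\delta_i)\,i\right) \ge 1 - \frac{1}{\log^2 n}.$$
□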
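The estimate $\sum_{t \ge (1-\delta)i} \binom{i}{t} 2^{-i} = 2^{(H(\delta)+o(1)-1)i}$ used in the proof can be checked numerically. The Python sketch below computes the tail exactly with integer binomial coefficients and compares its normalized base-2 logarithm with $H(\delta) - 1$. The helper names are our own, and $\delta = 0.1$ is an arbitrary illustrative choice.

```python
import math


def H(p):
    """Binary entropy in bits (for 0 < p < 1)."""
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def log2_upper_tail(i, delta):
    """log2 Pr[Bin(i, 1/2) >= (1 - delta) * i], using exact integer arithmetic."""
    t0 = math.ceil((1 - delta) * i)
    count = sum(math.comb(i, t) for t in range(t0, i + 1))
    return math.log2(count) - i   # math.log2 accepts arbitrarily large ints


delta = 0.1
for i in (100, 1000, 5000):
    rate = log2_upper_tail(i, delta) / i
    print(i, round(rate, 4), round(H(delta) - 1, 4))
```

The difference between the two columns shrinks roughly like $(\log i)/i$, which is the $o(1)$ term in the exponent.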



We now combine the previous lemma over all $k$ vertex additions.

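Lemma 2.2. If each $\delta_i$ satisfies the condition of Lemma 2.1, then

$$\Pr\left(|E(S,S)| \ge \sum_{i=0}^{k-1}(1-\delta_i)\,i\right) \to 1 \quad \text{as } n \to \infty.$$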

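Proof. Since $V_1, V_2, \ldots, V_k$ are disjoint, using independence and Lemma 2.1 we get

$$\Pr\left(|E(S,S)| \ge \sum_{i=0}^{k-1}(1-\delta_i)\,i\right) \ge \prod_{i=0}^{k-1}\Pr\left(|E(v_{i+1},S_i)| \ge (1-\delta_i)\,i\right) \ge \left(1-\frac{1}{\log^2 n}\right)^{k} \ge 1 - \frac{k}{\log^2 n} = 1 - \frac{2}{\log n},$$

using $k = 2\log n$ and Bernoulli's inequality.
□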



The point is that we pick exactly one vertex from each part $V_i$ of the partition, and hence do not lose any randomness or independence of the edges. This gives a bound on the minimum number of edges one can expect, w.h.p., in the chosen set of $k$ vertices. We are not able to express in closed form the size of a subgraph obtainable with this algorithm for a specific density. Therefore, we state the best density one can guarantee w.h.p. for $k = 2\log n$. This is stated as a theorem below, which we prove subsequently.



Theorem 2.3. Our algorithm produces a subset $S \subseteq V$ of size $k = 2\log n$ such that $\mathrm{density}(S) \ge 0.951$, almost surely.



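Proof. From Lemma 2.2 we have that, almost surely,

$$|E(S,S)| \ge \sum_{i=0}^{k-1}(1-\delta_i)\,i \ge \sum_{i=0}^{\log m - 1} i + \sum_{i=\log m}^{k-1}\left(1 - H^{-1}\!\left(1-\frac{\log m}{i}\right)\right) i = \binom{k}{2} - \sum_{i=\log m}^{k-1} H^{-1}\!\left(1-\frac{\log m}{i}\right) i, \tag{1}$$

where $m = n/(2k\ln(\log n))$ and $H^{-1}$ denotes the inverse of $H$ on $[0, 1/2]$. Here we use the fact that we can choose $\delta_i = 0$ for the first $\log m$ steps, and $\delta_i = H^{-1}(1 - (\log m)/i)$ afterwards; both choices satisfy the condition of Lemma 2.1. Now let $k - 1 = (1+a)\log m$. Then, substituting $i = \log m + j$ and $x = j/\log m$,

$$\sum_{i=\log m}^{k-1} H^{-1}\!\left(1-\frac{\log m}{i}\right) i = \sum_{j=0}^{a\log m}(\log m + j)\,H^{-1}\!\left(1 - \frac{\log m}{\log m + j}\right) = \log^2 m \cdot \frac{1}{\log m}\sum_{j=0}^{a\log m}\left(1+\frac{j}{\log m}\right)H^{-1}\!\left(1-\frac{1}{1 + j/\log m}\right) \le \log^2 m \int_0^a (1+x)\,H^{-1}\!\left(1-\frac{1}{1+x}\right)dx + O(\log m), \tag{2}$$

since the integrand is increasing in $x$.

Now using Equations (1) and (2), together with $\binom{k}{2} = 2\log^2 n\,(1+o(1))$ and $\log m \le \log n$, we have

$$\mathrm{density}(S) = \frac{|E(S,S)|}{\binom{k}{2}} \ge 1 - \frac{1}{2}(1+o(1))\int_0^a (1+x)\,H^{-1}\!\left(1-\frac{1}{1+x}\right)dx - o(1) \ge 0.951,$$

using

$$a = \frac{k-1}{\log m} - 1 = \frac{2\log n - 1}{\log n - \log(4\log n \cdot \ln(\log n))} - 1 = 1 + o(1)$$

and computing an upper bound on the integral numerically.
□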
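The numerical step at the end of the proof can be reproduced along the following lines. This Python sketch evaluates the limiting case $a = 1$; the helper names and the grid size are our own choices. Since the integrand $(1+x)H^{-1}(1 - 1/(1+x))$ is nonnegative and increasing on $[0, 1]$, a right-endpoint Riemann sum over-estimates the integral, and the bisection below rounds $H^{-1}$ upward, so the resulting value is an upper bound on the integral up to floating-point error.

```python
import math


def H(p):
    """Binary entropy in bits, with H(0) = 0."""
    if p <= 0.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def H_inv_upper(y, iters=60):
    """Approximate H^{-1}(y) on [0, 1/2] from above by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(iters):
        mid = (lo + hi) / 2
        if H(mid) >= y:
            hi = mid
        else:
            lo = mid
    return hi  # H(hi) >= y is kept invariant, so hi >= H^{-1}(y)


def integrand(x):
    # (1 + x) * H^{-1}(1 - 1/(1 + x)), written with 1 - 1/(1 + x) = x / (1 + x)
    return (1 + x) * H_inv_upper(x / (1 + x))


def right_riemann(f, a, b, steps):
    """Right-endpoint Riemann sum; an upper bound for increasing f."""
    h = (b - a) / steps
    return h * sum(f(a + j * h) for j in range(1, steps + 1))


integral_upper = right_riemann(integrand, 0.0, 1.0, 2000)
density_lower = 1 - 0.5 * integral_upper
print(f"integral <= {integral_upper:.5f}, density >= {density_lower:.4f}")
```

It should report a density bound of about $0.951$. For finite $n$ one has $a > 1$, so this limiting value is only approached as $n \to \infty$.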



3 Upper bound on the largest subgraph of density 1 − δ

In this section, we upper bound the size of the largest subgraph of density $1-\delta$ in $G(n, 1/2)$.

Theorem 3.1. A random graph $G(n, 1/2)$ has no subgraph of size

$$\frac{2\log n + 2\log e}{1 - H(\delta) - o(1)} + 1$$

and density at least $1-\delta$, almost surely. In particular, there is no subgraph of size $2.784\log n$ and density at least $0.951$, almost surely.

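Proof. For every $S \subseteq V$ of size $k$, define an indicator random variable $X_S$ as follows:

$$X_S = \begin{cases} 1 & \text{if } S \text{ induces a subgraph of density } \ge 1-\delta, \\ 0 & \text{otherwise.} \end{cases}$$

Thus

$$\mathbb{E}[X_S] = \sum_{t=(1-\delta)\binom{k}{2}}^{\binom{k}{2}} \binom{\binom{k}{2}}{t} 2^{-\binom{k}{2}} = 2^{(H(\delta)+o(1)-1)\binom{k}{2}}.$$

By linearity of expectation, the expected number of subgraphs of size $k$ and density at least $1-\delta$ is

$$\mathbb{E}\Big[\sum_{S:|S|=k} X_S\Big] = \binom{n}{k} 2^{(H(\delta)+o(1)-1)\binom{k}{2}} \le \left(\frac{en}{k}\right)^k 2^{-(1-H(\delta)-o(1))\frac{k(k-1)}{2}} = \left(\frac{en}{k}\, 2^{-(1-H(\delta)-o(1))\frac{k-1}{2}}\right)^k = \left(\frac{1}{k}\right)^k \to 0 \quad \text{as } n \to \infty,$$

using

$$k = \frac{2\log n + 2\log e}{1 - H(\delta) - o(1)} + 1.$$

Therefore, by Markov's inequality we have

$$\Pr\Big(\sum_{S:|S|=k} X_S \ge 1\Big) \le \mathbb{E}\Big[\sum_{S:|S|=k} X_S\Big] \to 0$$

as $n \to \infty$. In other words, almost surely there is no subset of $k$ vertices that induces a subgraph of density at least $1-\delta$.
□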
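The first-moment calculation can also be evaluated numerically for concrete parameters. The Python sketch below computes $\log_2 \mathbb{E}[\sum_{|S|=k} X_S] = \log_2\binom{n}{k} + \log_2 \Pr(X_S = 1)$ for $n = 2^L$, summing the binomial tail in log space via `math.lgamma`. The function names and the parameters $L = 200$ and $\delta = 0.049$ (density $0.951$) are our own illustrative choices.

```python
import math

LN2 = math.log(2)


def log2_comb_n_k(log2_n, k):
    """log2 C(n, k) for n = 2**log2_n (n kept as an exact Python int)."""
    n = 2 ** log2_n
    return sum(math.log2(n - j) for j in range(k)) - math.lgamma(k + 1) / LN2


def log2_dense_prob(k, delta):
    """log2 Pr[X_S = 1] = log2 Pr[Bin(C(k,2), 1/2) >= (1 - delta) C(k,2)]."""
    N = math.comb(k, 2)
    t0 = math.ceil((1 - delta) * N)
    ln_terms = [math.lgamma(N + 1) - math.lgamma(t + 1) - math.lgamma(N - t + 1)
                for t in range(t0, N + 1)]
    peak = max(ln_terms)   # log-sum-exp for numerical stability
    ln_tail = peak + math.log(sum(math.exp(x - peak) for x in ln_terms))
    return ln_tail / LN2 - N


def log2_expected_count(log2_n, k, delta):
    """log2 of the expected number of k-subsets with density >= 1 - delta."""
    return log2_comb_n_k(log2_n, k) + log2_dense_prob(k, delta)


for k in (500, 600):
    print(k, round(log2_expected_count(200, k, 0.049)))
```

For these parameters the computed exponent is positive for $k = 500$ and negative for $k = 600$. Only the negative side is used in the proof: a very negative value means the expected count, and hence by Markov's inequality the probability that such a subset exists, is tiny. A positive value does not by itself show that a dense subset exists.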



Notice that for density $0.951$, the ratio between the size of the largest subgraph that exists and the size of the largest subgraph that we can find is smaller than in the case of cliques, where it is 2. This is interesting, although not entirely unexpected, since for density $0.5$ the whole graph can be output. For density $0.951$ the ratio is $2.784/2 = 1.392$, significantly smaller than 2.

4 Conclusions 

For a concrete open problem: is there a polynomial-time algorithm that outputs a subgraph of density $1-\epsilon$ and size $2\log n$ for any choice of $\epsilon > 0$?

Are there simple algorithms that beat the density bound of $0.95$ for subgraphs of size $2\log n$? Is there an $O(n^{\log n})$ time algorithm that finds the largest clique in $G(n, 1/2)$? If not, what is the maximum density obtainable for a subgraph of size $2\log n$? Spectral techniques could be tried.